Week 2: Cognitive Perspectives and Introduction to ggplot2

Emorie D Beck

Quick Review

Review

What are the core elements of ggplot2 grammar?

From last week:

  • Mappings: base layer
    • ggplot() and aes()
  • Scales: control and modify your mappings
    • e.g., scale_x_continuous() and scale_fill_manual()
  • Geoms: plot elements
    • e.g., geom_point() and geom_line()
  • Facets: panel your plot
    • facet_wrap() and facet_grid()
  • Themes: style your figure
    • Built-in: e.g., theme_classic()
    • Manual: theme() (legend, strip, axis, plot, panel)
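
As a quick refresher, the five elements above can be combined in a single plot. This is a minimal sketch, not the code from last week: it assumes the `mpg` data set that ships with ggplot2, and the colors and axis limits are arbitrary choices made for illustration.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in here for any data set
p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +  # mappings
  geom_point() +                                           # geom
  scale_x_continuous(limits = c(1, 7), breaks = 1:7) +     # scale (axis)
  scale_color_manual(                                      # scale (color)
    values = c("4" = "#E69F00", "f" = "#56B4E9", "r" = "#009E73")
  ) +
  facet_wrap(~ class) +                                    # facets
  theme_classic() +                                        # built-in theme
  theme(legend.position = "bottom")                        # manual theme tweak

p
```

Each `+` adds one layer of the grammar, so any line after the first can be dropped or swapped without touching the others.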

Quick Review

Colorblindness and accessible plots

  • Adding in a colorblind-friendly palette from Wong (2011)
# A tibble: 8 × 3
  name           rgb       hex    
  <chr>          <list>    <chr>  
1 black          <dbl [3]> #000000
2 sky blue       <dbl [3]> #56B4E9
3 bluish green   <dbl [3]> #009E73
4 yellow         <dbl [3]> #F0E442
5 orange         <dbl [3]> #E69F00
6 blue           <dbl [3]> #0072B2
7 vermillion     <dbl [3]> #D55E00
8 reddish purple <dbl [3]> #CC79A7
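
One way to build a palette like this and hand it to a fill scale is sketched below. The hex codes are copied from the table above; the object name `wong`, the use of `col2rgb()` to derive the RGB triplets, and the `mpg` example plot are assumptions made for illustration.

```r
library(tibble)
library(ggplot2)

# Hex codes copied from the table above
wong <- tibble(
  name = c("black", "sky blue", "bluish green", "yellow",
           "orange", "blue", "vermillion", "reddish purple"),
  hex  = c("#000000", "#56B4E9", "#009E73", "#F0E442",
           "#E69F00", "#0072B2", "#D55E00", "#CC79A7")
)

# Derive one numeric RGB vector per color from its hex code
wong$rgb <- lapply(wong$hex, function(h) as.numeric(col2rgb(h)))

# Use three of the colors for a three-level fill (mpg ships with ggplot2)
p <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar() +
  scale_fill_manual(values = wong$hex[2:4])

p
```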

Part 1: Proportions

Visualizing Proportions

  • Proportions are often important in our research
  • Whether we are describing sample-level differences or the frequency of behaviors, events, or experiences, we often need to describe amounts relative to a whole
  • But the goals we are trying to achieve are varied, which necessitates the use of different graphics

Part 1: Agenda

  • We will cover six kinds of visualizations, all of which were covered in your readings
  • We will cover both when to use them and how to create them
    • Pie Charts
    • Bar Charts (Stacked)
    • Bar Charts (Side-by-Side)
    • Bar Charts and Density Across Continuous Variables
    • Mosaic Plots
    • Parallel Sets

But First, Our Data

  • Today, we’ll use the teaching sample from the German Socioeconomic Panel Study (GSOEP)
  • GSOEP is an ongoing longitudinal panel study that began in 1984 (26 waves of data!)
  • ~20,000 people are sampled each year
  • Samples households in Germany
  • Has additional sub-projects (e.g., innovation studies, migrant panel, etc.)
  • The data are publicly available via application
# A tibble: 360,553 × 9
    year    SID marital chldbrth gender         yearBrth mortality   job   age
   <dbl>  <dbl>   <dbl>    <dbl> <dbl+lbl>         <dbl>     <dbl> <dbl> <dbl>
 1  1999    901       4        0 2 [[2] Female]     1951         0    NA    48
 2  1999   1202       3        0 2 [[2] Female]     1913         0    NA    86
 3  1999   1901       2        0 2 [[2] Female]     1948         0    NA    51
 4  1999   2301       2        0 1 [[1] Male]       1946         0    NA    53
 5  1999   2302       2        0 2 [[2] Female]     1946         0    NA    53
 6  1999   2501       2        0 2 [[2] Female]     1924         0    NA    75
 7  1999   2801       2        0 1 [[1] Male]       1947         0    NA    52
 8  1999   2802       2        0 2 [[2] Female]     1956         0    NA    43
 9  1999 910603       2        0 1 [[1] Male]       1959         0    NA    40
10  1999   2901       2        0 1 [[1] Male]       1932         0    NA    67
# … with 360,543 more rows

Pie Charts

  • You may be wondering if you should ever use a pie chart
  • The answer is, of course, it depends
  • Pie charts are great when:
    • What you want to visualize is simple (e.g., basic fractions)
    • You want to clearly emphasize proportion relative to the whole
    • You have a small data set

Pie Charts

  • In our data, we have a few variables that fit these criteria, but we’ll focus on two:
    • marital status (4 groups)
    • gender (2 groups)
  • ggplot2 doesn’t specifically support pie charts
  • Why? Because it’s a layered grammar of graphics, and an explicit pie chart function would be redundant with one of the built-in coordinate systems
    • specifically, coord_polar()
  • So to make a pie chart, we’ll use geom_bar() + coord_polar()
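
The `geom_bar()` + `coord_polar()` recipe can be sketched as follows. The data frame `toy` is invented for illustration and is not the GSOEP sample; the marital status codes 1 to 4 are an assumption based on the four groups mentioned above.

```r
library(ggplot2)

# Toy data, invented for illustration only: one row per person,
# with marital status assumed to be coded 1-4
toy <- data.frame(marital = c(1, 1, 1, 2, 2, 2, 2, 3, 4, 4))

p <- ggplot(toy, aes(x = "", fill = factor(marital))) +
  geom_bar(width = 1, color = "white") +  # a single stacked bar of counts
  coord_polar(theta = "y") +              # map bar height to angle
  labs(fill = "Marital status (code)") +
  theme_void()                            # drop axes, which carry little meaning here

p
```

Without the `coord_polar()` line, the same code draws a single stacked bar, which is why a dedicated pie geom would be redundant.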

Pie Charts

Improvements

Pie Charts

More Improvements

Stacked Bar Charts

  • Like pie charts, stacked bar charts have their time and place
  • In particular:
    • Show proportions relative to the total
    • Can be used to show changes over time
  • To demonstrate, let’s look at marriage across emerging adulthood (ages 18-26)
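
A stacked bar chart of proportions by age might look like the sketch below. The data are randomly generated for illustration (they are not the GSOEP sample), and the marital status codes 1 to 4 are an assumption.

```r
library(ggplot2)

# Toy data, randomly generated for illustration: one row per observation
set.seed(1)
toy <- data.frame(
  age     = sample(18:26, 500, replace = TRUE),
  marital = sample(1:4, 500, replace = TRUE)
)

p <- ggplot(toy, aes(x = age, fill = factor(marital))) +
  geom_bar(position = "fill") +                   # rescale each age's stack to 1
  scale_x_continuous(breaks = 18:26) +
  scale_y_continuous(labels = scales::percent) +  # show the y axis as percentages
  labs(x = "Age", y = "Share of sample", fill = "Marital status (code)") +
  theme_classic()

p
```

Using `position = "fill"` lets ggplot2 compute the proportions from raw rows; precomputing them and using `geom_col()` is an equally reasonable choice.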

Stacked Bar Charts

Improvements: Color

Stacked Bar Charts

Improvements: Label & Scales

Stacked Bar Charts

Improvements: Legend

Stacked Bar Charts

Improvements: Theme Elements Exercise

  1. Bold axis text and increase size
  2. Bold axis titles and increase size
  3. Bold and center the title and subtitle (hint: you will also need to wrap the title text)
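
One possible approach to the three steps is sketched below; it is not the only solution. It assumes the `mpg` data set that ships with ggplot2 as a stand-in, a placeholder title and subtitle, and arbitrary relative sizes. Base R's `strwrap()` is used for wrapping; `stringr::str_wrap()` would work as well.

```r
library(ggplot2)

long_title <- "A deliberately long placeholder title that needs wrapping to fit above the plot"
# Break the title into lines of under 40 characters, joined by newlines
wrapped <- paste(strwrap(long_title, width = 40), collapse = "\n")

p <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +
  labs(title = wrapped, subtitle = "Placeholder subtitle") +
  theme_classic() +
  theme(
    axis.text     = element_text(face = "bold", size = rel(1.2)),  # step 1
    axis.title    = element_text(face = "bold", size = rel(1.3)),  # step 2
    plot.title    = element_text(face = "bold", hjust = 0.5),      # step 3
    plot.subtitle = element_text(face = "bold", hjust = 0.5)
  )

p
```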

Side-by-Side Bar Charts

  • Stacked bar charts are great for showing sequences but can make it difficult to compare within a stack
  • Side-by-side bar charts make it much easier to compare across categories and work well when broken into many categories
  • But they can be difficult to understand across sequences
  • To demonstrate, let’s look at marriage rates across three waves
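
A side-by-side version can be sketched with `position = "dodge"`. The counts, the three wave years, and the marital status codes below are all invented for illustration and do not come from the GSOEP sample.

```r
library(ggplot2)

# Toy counts invented for illustration: three hypothetical waves x four status codes
toy <- data.frame(
  year    = rep(c(1999, 2004, 2009), each = 4),
  marital = rep(1:4, times = 3),
  n       = c(40, 30, 20, 10, 35, 30, 22, 13, 30, 32, 24, 14)
)

# Convert counts to within-wave proportions
toy$prop <- toy$n / ave(toy$n, toy$year, FUN = sum)

p <- ggplot(toy, aes(x = factor(marital), y = prop, fill = factor(year))) +
  geom_col(position = "dodge") +  # place the waves next to each other
  labs(x = "Marital status (code)", y = "Proportion", fill = "Wave") +
  theme_classic()

p
```

Swapping the `x` and `fill` mappings groups the bars by wave instead, which changes which comparison is easiest to read.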

Side-by-Side Bar Charts

Improvements: Order

Side-by-Side Bar Charts

Improvements: Labels

We could label the bars, but let’s label the axes instead

Side-by-Side Bar Charts

Improvements: Theme Elements

Side-by-Side Bar Charts

Improvements: Colors

Exercise:

  • Improve the colors so that they:
    • Are colorblind-friendly
    • Match the goal of the plot (see title)

Time Series and Bar Charts

  • Bar Charts and Density Across Continuous Variables
  • Mosaic Plots
  • Parallel Sets

Part 2: Probability